ABSTRACT

Differences in sets of criteria for evaluating
microcomputer software are discussed. They are set against the
results of three studies in which teachers in the United Kingdom
evaluated five programs which were used in reading or English
lessons. A comparison of the checklist criteria with the case study
data was made using Stake's (1967) matrix of evaluation concerns.
This suggested a heavy emphasis on antecedents in the checklists and 
on transactions in the case studies. In general, neither checklists 
nor case studies devoted great attention to empirically measured 
outcomes. A possible interpretation of the results is that while the 
checklists focussed on intrinsic evaluation, the case studies 
themselves focussed on practical classroom issues, notably attention 
and motivation. (Author) 
Criteria for evaluating microcomputer software for reading
development: observations based on three British case studies. 









The problem 

According to Lathrop (1982), the critical evaluation of
educational microcomputer programs in the US has not kept pace
with the proliferation of software packages, with reviews of less
than 5 percent appearing in print. A further problem surrounds
the issue of what criteria should be adopted for evaluation. An
examination of five recently published sets of guidelines for
software evaluation (Jelden, 1981; Golub, 1982; Devall, 1983;
Burkhardt et al., 1982; Adams and Jones, 1983) reveals that a
number of different assumptions are made by specialists on both
sides of the Atlantic about what questions a software review
should address.



Aims of the study



This paper sets out to compare the evaluative assumptions built
into published sets of guidelines with those derived more
directly from three small UK case studies of microcomputer usage
in reading/language classes. The case studies provide data on
teacher and student reaction to five computer programs, each of
which was used in a small-group context by children in the 9-13
age range.
The need for a conceptual framework for comparing and analysing
the guidelines

As Robert Stake has pointed out (1980), oversimplification
obfuscates. Nevertheless, in seeking to compare five very
different sets of guidelines, some procedure for data reduction
is essential. In this case it is proposed to use a variant of
Stake's own description-judgment matrix (1967, 1980) in order to
structure an analysis of the content of the lists. Stake
originally offered his matrix as an aid to evaluators who were
devising a "shopping list" of what data to gather, and it seems
worthwhile to apply it retrospectively in order to analyse and
compare the issues and concerns which are implicit in the
checklists in the present study. This analysis will be of
interest in illuminating some of the areas of emphasis and
omission in the five sets of guidelines, but the Stake matrix
will also be used for an analysis of the data of the three case
study reports. The data collection and reporting for the case
studies were carried out for the most part by non-specialists in
evaluation. A comparison of the two matrices will therefore
provide an indication of the extent to which there is a match
between the issues and concerns in the guidelines and those which
surface in the classroom.

The original matrix consisted of a four-by-three array of cells:
the horizontal axis was labelled intents, observations, standards
and judgments, while the vertical axis was labelled antecedents,
transactions and outcomes. Stake also had a thirteenth
free-floating box labelled rationale. The horizontal axis was
divided, with intents and observations labelled as the
description matrix, and standards and judgments labelled as the
judgment matrix. This division suggested that Stake saw a sharp
distinction between the concepts of description and observation.
In fact, he acknowledged in his later paper (1980) that while he
felt that the matrix could still be useful in planning an
evaluation, the concept of observations is an extremely broad
one, and could, in certain circumstances, encompass intents,
standards, judgments and statements of rationale.
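
The layout of the matrix can be pictured as a simple data
structure. The Python sketch below is offered only as an
illustration: the identifiers (ROWS, COLUMNS, empty_matrix) are
invented for the example and do not come from Stake's papers. Only
the row and column labels, and the single sample item (taken from
Appendix A, where it is coded as an antecedent judgment), come from
the text.

```python
# Illustrative sketch of the layout of Stake's (1967) matrix.
# All identifiers are hypothetical; only the labels come from the text.

ROWS = ("antecedents", "transactions", "outcomes")
DESCRIPTION_COLUMNS = ("intents", "observations")
JUDGMENT_COLUMNS = ("standards", "judgments")
COLUMNS = DESCRIPTION_COLUMNS + JUDGMENT_COLUMNS


def empty_matrix():
    """Return the twelve cells plus the free-floating rationale box."""
    cells = {(row, col): [] for row in ROWS for col in COLUMNS}
    return {"rationale": [], "cells": cells}


matrix = empty_matrix()
# A checklist item filed in the 'antecedents' row, 'judgments' column.
matrix["cells"][("antecedents", "judgments")].append(
    "Is the content of educational value?"
)
print(len(matrix["cells"]))  # 12 cells; 'rationale' is the thirteenth box
```

Each cell holds a list, so that any number of items can be filed
under the same heading.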

In the present paper, two sub-divisions of Stake's categories,
intents and standards, have been omitted. This has been done
partly for clarity of presentation, but there is also evidence
that these two categories are relatively minor in comparison with
the categories of observations and judgments. In Clift's (1981)
study of checklists for whole-school evaluation, for example,
observations and judgments accounted for 142 out of the 156 items
recorded using the full matrix.

It is perhaps worthwhile to offer a brief gloss on how the
categories have been interpreted, since the issue of
interpretation is both subjective and problematical.
Antecedents, transactions and outcomes have been interpreted
as relating to observations or judgments which are made
respectively before, during or after classroom activity. This
might seem the obvious interpretation, but in fact others are
possible: for example, the question 'Do the children work in
groups while using this program?' might seem to be a
straightforward question of observing a transaction in the
classroom. However, to a curriculum developer, having the
children work in groups might be a desired outcome of the use of
the program. In the present analysis, however, the term
outcomes is restricted to post hoc data, collected after the
classroom session has ended. Equally, if an observation or
judgment can be made before the classroom session begins, it will
be classified as an antecedent. Thus a question such as 'Are
there no more than three frames before a call for a response?'
would be classed as an antecedent observation, since it could be
answered in advance of the session by the teacher alone.

The question of differentiating between observations and
judgments can also be problematical. In many cases there is
little doubt: 'Are supplementary materials provided?' would seem
to be a question which can be resolved unequivocally by examining
the package: an observation. In contrast, a question such as
'Is the program free of pedagogical errors?' is hardly an issue
which can be decided unequivocally through description or
observation; it would therefore be classed as a judgment.

As an example of a more difficult question to classify, one could
consider the following: 'Is the program logically crashproof?'
In this case, the teacher might try to answer the question by
testing the program before the lesson. He or she might find a
bug which causes the program to crash when certain keys are
depressed: the question is unequivocally resolved, and is
therefore an observation. Suppose, however, no bug is found. In
this case one could argue that the teacher has to make a
judgment, and that the question is analogous to 'Are all
possible user errors trapped and help messages provided?', which
would certainly seem to be a difficult question to answer
unequivocally.

Perhaps the best solution to this problem would be to accept that 
the notions of observation and judgment are not dichotomous, but 
rather regions at opposite ends of a continuum. Thus, while 
there is bound to be a subjective element in classifying 
questions as matters of observation or judgment, it is only in 
the middle of the continuum that that subjectivity will lead to 
unreliable judgments, and this need not therefore invalidate the
whole decision-making procedure.
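
The coding rule described in this section can be summarised in a
short sketch. The Python below is illustrative only: the function
name stake_code, its two arguments, and the reduction of the
observation-judgment continuum to a yes/no flag are assumptions made
for the example, not a procedure taken from Stake or from the
analysis reported here. The two-letter codes follow the pattern of
the abbreviations used in the appendices (AO, AJ, TO, TJ, OJ).

```python
# Hypothetical sketch of the coding rule: timing gives the first letter,
# observation/judgment gives the second.

PHASE = {"before": "A", "during": "T", "after": "O"}


def stake_code(timing, resolved_by_inspection):
    """Return a two-letter code such as 'AO' or 'TJ'.

    timing: 'before', 'during' or 'after' the classroom session.
    resolved_by_inspection: True if the question can be settled
        unequivocally by looking (an observation), False if it calls
        for an opinion (a judgment). Treating this as a boolean is a
        simplification of the continuum discussed in the text.
    """
    kind = "O" if resolved_by_inspection else "J"
    return PHASE[timing] + kind


# Examples drawn from the text. The 'before' timing for the second
# question is an assumption: the text only classes it as a judgment.
print(stake_code("before", True))   # 'Are supplementary materials provided?' -> AO
print(stake_code("before", False))  # 'Is the program free of pedagogical errors?' -> AJ
print(stake_code("during", True))   # 'Do the children work in groups ...?' -> TO
```

Questions from the middle of the continuum, such as 'Is the program
logically crashproof?', are exactly the cases where the yes/no flag
would have to be set by the coder's own judgment.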

An analysis of the five checklists

The five checklists described below were found as a result of a
survey of the educational computing literature made in England in
1983. The provenance of the checklists varied. The Adams and
Jones list (1983, pp. 129-131) is given at the end of a book on
the place of the microcomputer in the humanities curriculum, and
follows a statement in which the authors freely give their
opinions on which educational publishers are producing worthwhile
software and support materials. Burkhardt et al. (1982, pp. 85-94),
by contrast, take a much less partisan view, and offer their
checklist as part of an in-service pack designed to help teachers
become more systematic evaluators of their own practice. The
book emphasises the use of the microcomputer, but much of it
would be appropriate for supporting formative and summative
evaluation of other types of teaching material.



Of the three US checklists, two appeared in widely-circulated
journals: Devall's list (1983, p. 553) appeared as an open letter
in the Journal of Reading, while that of Golub (1982, pp. 28-29)
appeared as part of an article in The Computing Teacher.
Finally, Jelden's list (1983, p. 159) was reprinted from another
source as part of an extensive annotated bibliography in a
specialist book for reading teachers on computer applications in
their subject.

The items in the checklists were assigned to Stake's categories
in the manner described above, and the result is shown in Table 1
(see Appendix A for an annotated example of one of the
checklists). While it would be inappropriate to analyse the data
too finely, a number of points may be made about differences
between the checklists. Firstly, there is an overall weight of
emphasis which in terms of number of items gives

antecedents > transactions > outcomes.

Secondly, there is an overall emphasis, especially marked in the
two UK studies, of

judgments > observations.

Looking more closely at the US lists, it is interesting to note
the similarity between the lists of Devall and Golub. Jelden, by
contrast, provides the only example of a checklist in which
observations outnumber judgments.

[Table 1 about here]

What do these differences suggest in practical terms? In
general, the emphasis on antecedent judgments perhaps reflects a
wish to encourage an intrinsic evaluation of the educational
goals of the software, and to address pedagogical considerations
such as whether the content is clearly organised and presented.
The emphasis on antecedent observations, the second largest
category overall, perhaps reflects a concern with technical
considerations concerning the mechanics of use.

In some respects, this emphasis on antecedents is hardly
surprising. Teachers usually have to make judgments about the
likely worth of a program before they actually have an
opportunity to try it out in the classroom. Generally speaking,
it is not commercially viable to make inspection copies of
software available: procedures for unlocking 'protected'
software become common knowledge too rapidly. The authors of
the checklists will have been well aware of this, and their
guidelines therefore make few assumptions about the possibility
of any classroom-based evaluation. This offers a pragmatic
explanation for the emphasis on antecedents. We shall return
later in this paper to the issue of precisely what interpretation
should be put on an apparent lack of attention to transactions
and outcomes in four of the five checklists. Before that,
however, it seems best to introduce and describe the main data
source in this report, the three case studies. This will enable
a contrastive account to be attempted, and will permit a fuller
discussion of the applicability of Stake's matrix.

The three case studies 
Case Study 1 was a dissertation completed as part of an
in-service B.Ed. degree (Chan, 1983). It was based on an evaluation
of two reading development programs, STORYBOARD and CLUES, both
of which feature word deletion as a means of encouraging
attentive reading and group discussion. STORYBOARD gives a
totally deleted text, and information is available from prior
exposure to the passage and from proportional-length blanks which
are given complete with punctuation; CLUES is a cloze-type
exercise of the more familiar variety. In a crossover design,
two groups of 6 students aged thirteen worked with both programs,
using one of two specially selected short stories on each of the
programs. Their responses and reactions were recorded on sound
tape during and after the two sessions of activity, and the
students also completed a questionnaire and cloze reading
comprehension post-tests.

Case Study 2 reports the use of "Adventure Game" programs and an 
arcade game similar to "Pac-Man" in English lessons with a class 
of 25 twelve- to thirteen-year-olds. Over two six-week periods 
the students worked in small groups to produce either creative 
writing or a guide for other students who might wish to learn the
strategies of each game. Two teachers worked with the class, 
and they kept a written record of their evaluation of the 
students' use of the microcomputers. 

Case Study 3 reports the results of a formative evaluation of 
WILT, a spelling game which gives students information about 
likely letter patterns in English. The program contains a 
matrix of bigram frequencies derived from an analysis of the 
prose of newspapers and novels; the student can call up 
histograms showing how likely it is that any letter of the
alphabet will be followed by any other. Data collection was
carried out in six schools, three near London and three close to
Nottingham. It was based on classroom observation,
a questionnaire for teachers, a discussion with children and with
the teachers, and unsolicited verbal or written comment.

These were rather different types of study, and in seeking to 
systemise an approach to applying Stake's categories one faces 
some problems. In the event, Case Studies 2 and 3 were not too 
difficult to analyse: they each amounted to less than twelve 
pages of text, and a statement-by-statement rating of comments 
was not onerous. 

Case Study 1, by contrast, was much more problematical. At
which points in a dissertation can one be said to locate the
statements which most define the concerns of the study? This
was especially difficult in the present case since the whole
topic was on the theme of evaluation. One obvious candidate
for analysis would presumably be the hypothesis section. On
investigation, however, it was clear that there was a slight
discontinuity between what was actually studied in some depth
and what was highlighted in the hypotheses. Chan's hypotheses
stressed those issues which were tested through cloze and reading
comprehension, but they did not emphasise her interest in the
transactions of the classroom, nor her intention to administer an
attitude questionnaire. By contrast, however, in a section
titled iJlt£o^_cM^ ajTd s^^ Chan does give
a list of the questions which the study attempts to explore, and
this includes reference to both the quantitative and qualitative
facets of her work. Another section of the study which gives
an indication of her interests as an evaluator is the appendix,
which includes a transcript of an interview with a group of
children about the positive and negative aspects of using
microcomputers in school.

After further consideration, therefore, it was decided to focus 
solely on these two aspects of Chan's study for the Stake 
analysis. In making this decision it was recognised that the
issue of selection is complex, and one which might well have 
been approached differently. Thus, although her study totalled 
70 pages plus appendices, in the present analysis it yielded only 
nineteen items which were categorised using the Stake matrix. 

Appendix B gives an example of material from one of the case 
studies, together with an indication of how the statements were 
classified.
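
Once each statement carries a two-letter code, building a frequency
table is a matter of tallying. The following Python sketch shows one
way this might be done; it is an illustration rather than the
procedure actually followed, the names PHASES, KINDS and tally are
invented for the example, and the codes in sample are made up and are
not taken from any of the case studies.

```python
from collections import Counter

# First letter: when the observation or judgment is made.
PHASES = {"A": "antecedents", "T": "transactions", "O": "outcomes"}
# Second letter: observation or judgment.
KINDS = {"O": "observations", "J": "judgments"}


def tally(codes):
    """Count two-letter codes into the six cells of the reduced matrix."""
    counts = Counter(codes)
    return {
        (PHASES[p], KINDS[k]): counts[p + k]
        for p in PHASES
        for k in KINDS
    }


# Invented codes, for illustration only.
sample = ["TO", "TJ", "AJ", "AJ", "AJ", "OJ"]
table = tally(sample)
for cell, count in table.items():
    print(cell, count)
```

Cells for which no statement was coded simply appear with a count of
zero.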

Results of analysis of case study data 

The results of applying Stake's categories to the data in the
case studies are shown in Table 2. As has already been noted,
the decision to focus on two relatively limited sections of
Chan's dissertation explains the comparatively small number of
items relating to Case Study 1. The data for Case Studies 2 and
3 are based on pooled results for two and six respondents
respectively, and it is perhaps worth noting that although the
individual results are not shown, there were in fact fairly
similar distributions within each of the two groups.

(Table 2 about here)

The main emphases shown in Table 2 are in the areas of
transaction observations, transaction judgments, and outcome
judgments. Together these account for 247 out of 295 statements
analysed. Transaction observations were generally descriptions
of student activity, such as 'pupils paid much more attention to
the letter count' (Case Study 3, Respondent 4), or of teacher
activity: 'I opened my mouth to shout "Tracey, don't shout!" but
the word "Down!" came out instead.' (Case Study 2, Respondent 1).
Transaction judgments were generally opinions which led to
tactical decisions during the lessons, or which were aspects of a
formative evaluation of a program in action: 'their enthusiasm
was also noticeable and they needed a teacher to keep control'
(Case Study 3, Respondent 1); 'They were just beginning to make
interesting moves when their time was up' (Case Study 2,
Respondent 2). Outcome judgments were generally part of a
summative evaluation of the program, lesson, or associated
coursework. These were opinions which were not substantiated by
corroborative evidence: 'I thought that it (a piece of written
work) lacked a certain realistic quality.' (Case Study 2,
Respondent 2); 'With more appropriate words, I see no reason why
less able readers and younger children should not be able to use
the program beneficially.' (Case Study 3, Respondent 6).
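
The combined figure quoted above can be checked with a few lines of
arithmetic. The Python sketch below uses the three column totals
given in Table 2 (88, 68 and 91) and the stated total of 295
statements; the variable names are invented for the example.

```python
# Column totals from Table 2 for the three dominant categories.
main_categories = {
    "transaction observations": 88,
    "transaction judgments": 68,
    "outcome judgments": 91,
}
total_statements = 295

combined = sum(main_categories.values())
share = combined / total_statements

print(combined)            # 247
print(round(100 * share))  # 84 (per cent of all statements)
```

In other words, roughly five statements in every six fall into one
of these three categories.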

After these three categories, the next largest is that of
antecedent judgments. In Case Study 1, the issues which were
assigned to this category were all culled from the interview
section in the appendix, e.g.: 'Do you think if you have learned
to use the computer at school it will be useful to you when you
leave?'; 'Do you think both boys and girls should learn to use
the computer?'. In the other case studies, too, antecedent
judgments tended to highlight issues related to intrinsic
evaluation: 'There is a danger that explicit language programs
will lead back to the formal arid drill and practice language
exercises which have now fallen into disrepute.' (Case Study 3,
Respondent 2); 'I consider it to be a very nice program which
seems to retain interest and has a true educational value.' (Case
Study 3, Respondent 3).

It is perhaps interesting to note that despite the practical
nature of the three studies, the emphasis on
empirically-determined outcome data was patchy. The fact that
there were only two items in Case Study 1 which came into the
outcome observations category should not be taken to imply that
post-test results were unimportant: in fact these points were the
central questions about the relationship between reading on the
microcomputer and gains in comprehension. In Case Study 3,
however, not a single reference is made by any of the six
respondents to any empirical examination of whether children
learn to spell by using the program WILT. It is as if attention
was focussed solely on intuitive assessments of motivation and
task-oriented activity, which together with a consideration of
the program's implicit educational philosophy formed the basis of
the final evaluation.

A comparison of the checklists and case studies

The aim of this paper is to compare the evaluative assumptions
built into the five sets of guidelines with those distilled from
the three case studies, and it is now possible to offer some
comment on the differences between the two, drawing initially
upon apparent differences in emphasis which are suggested by the
Stake matrix analysis. For convenience, the totals of Tables 1
and 2 have been reproduced alongside each other in Table 3, and
the results expressed in percentage form.

(Table 3 about here) 

The most striking difference between the two sets of items in
Table 3 is perhaps the relative salience of antecedents. If
these are represented as they were earlier in terms of greatest
to least, the following pattern emerges:

Checklists: antecedents > transactions > outcomes

Case Studies: transactions > outcomes > antecedents

Antecedents shift from the dominant to the least dominant
category, while in both groups transactions attract more
attention than outcomes. So far as the observation-judgment
continuum is concerned, judgments tend to outnumber observations
in both checklists and case studies, with the exceptions of
Jelden's checklist and the transactions section of Case Study 2.
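
The two orderings can be derived mechanically from the cell counts.
The Python sketch below is an illustration only. It assumes the
counts 81, 136, 20, 43, 1, 3 for the checklists and 2, 34, 88, 68,
12, 91 for the case studies, which is the reading of Table 3 under
which each row adds up to its stated total (284 and 295). The names
COUNTS, phase_totals and ranking are invented for the example.

```python
# Cell counts in the order: antecedent obs., antecedent judg.,
# transaction obs., transaction judg., outcome obs., outcome judg.
# Assumed reading of Table 3 (each row sums to its stated total).
COUNTS = {
    "checklists": (81, 136, 20, 43, 1, 3),
    "case studies": (2, 34, 88, 68, 12, 91),
}
PHASES = ("antecedents", "transactions", "outcomes")


def phase_totals(cells):
    """Add the observation and judgment counts for each phase."""
    return {
        phase: cells[2 * i] + cells[2 * i + 1]
        for i, phase in enumerate(PHASES)
    }


def ranking(cells):
    """Return the phases ordered from greatest to least."""
    totals = phase_totals(cells)
    return sorted(totals, key=totals.get, reverse=True)


for source, cells in COUNTS.items():
    print(source + ":", " > ".join(ranking(cells)))
# checklists: antecedents > transactions > outcomes
# case studies: transactions > outcomes > antecedents
```

The same totals show how far apart the two sources are: antecedents
account for 217 of the 284 checklist items but only 36 of the 295
case study statements.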


Discussion 

What do the kinds of differences shown up in Table 3 relate to in
real terms? Do the differences in emphasis between the
checklists and the case studies imply importantly different
evaluative perspectives, or are the differences mere artefacts,
created by the application of some rather arbitrary decision
procedures on a singularly amorphous set of data? It has
already been admitted that there is subjectivity in the
application of the Stake matrix to any dataset, but it has
equally been argued that this need not invalidate its use. It
has also been pointed out that a strict quantitative approach to
the numerical data would be inappropriate: the two occurrences
of outcome observation items in Case Study 1 referred to aspects
of that study to which a great deal of attention was given. To
apply inferential non-parametric statistics to these data would
therefore be potentially misleading. Nevertheless, there
remain a number of points which emerge from the comparison of the
checklists and case studies, and which are well worth
consideration despite these caveats. To emphasise their
tentativeness, the points will be expressed as questions:

Why do antecedents dominate the checklists? 

Is this an inevitable result of an agenda-setting operation? 

If it is, then why do Burkhardt et al. have so many items in
other categories?

What is the significance of the apparent subordination of
antecedents in the case studies?

Does this suggest an inattention to issues of intrinsic 
evaluation, or is attention to those issues masked by the 
crudeness of the matrix analysis? 

What is the significance of the apparent inattention to 
empirically-determined outcomes in both checklists and case 
studies? 
The fact that Table 3 raises all these questions may in itself be
regarded as important. Quantification is not valuable in
absolute terms, but only insofar as it performs a useful data
reduction function, and draws attention to trends and patterns.
In the present study, the type of material analysed included
lists, segments of oral discourse, a teacher's lesson journal and
a formal evaluation report submitted to a publisher. These data
are very different, and not easily compared one with another
without some systematic basis for analysis.

A possible interpretation of the emphasis on observational and
judgmental antecedents in the evaluation guidelines might be that
teachers are enjoined to consider the classroom potential of the
software in terms of its mechanics of use and also its intrinsic
educational merit. Equally, a possible interpretation of the
emphases in the case studies on observational and judgmental
transactions, and on judgmental outcomes, might be that when
teachers evaluate material, their attention is directed by the
exigencies of the classroom towards immediate and pragmatic
concerns. In such conditions, concerns such as time on task,
student motivation and cooperation are likely to be much more
dominant than either long-term pedagogical or philosophical
issues.



These interpretations are open to debate, but it is suggested
that they are important enough to merit further discussion, and
if the Stake analysis has helped to point up the issue, it has
perhaps served a useful purpose.
Conclusions



In the present study, sets of theoretical guidelines for
focussing on evaluation issues have been compared with the
results of three practical studies in which evaluation issues are
foregrounded and explored. It has been suggested that some
potentially important differences of emphasis have emerged, and
that in facilitating such comparisons, an analysis based on
Stake's (1967) matrix can be of value, provided that it is used
with circumspection.

In an area which is expanding so rapidly as that of
microcomputers in education there is an urgent need not only for 
evaluation, but for the assumptions built into evaluations and 
evaluation guidelines to be made clear. The results of this 
study suggest that the criteria of theoreticians and 
practitioners may differ in important ways, and that these 
possible differences should be further explored. 
Table 1. Stake's categories for evaluation applied to five checklists
for evaluating microcomputer software.

                 Antecedents      Transactions     Outcomes
                 Obs.   Judg.     Obs.   Judg.     Obs.   Judg.


Adams and 
Jones 


xxxxx xxxxxxxxxx 
xxxXxxxxxx 
xxxxxx 


X xxxxx 


X 


Burkhardt
et al.


xxxxxxxxxx xxxxxxxxxx 
xxxxxxxxxx xxxxxxxxxx 

XXXXXXXXXX xxxxxxxxxx 
xxxxxxxx xxxxxxxxxx 
xxxxxxxxxx 
xxxxxxxxxx 
xxxxxxxxxx
xxxxxxxxxx 

X 


xxxxxxxxxx xxxxxxxxxx 
XXXXXXXXX xxxxxxxxxx 

ajuuljulajulx 

xxxxxxxx 


X XX 


Devall


XXXXXXX XXXXXXXXX 






Golub 


xxxxxx xxxxxxxx 






Jelden 


xxxxxxxxxx XXX 

xxxxx 






Totals           Antecedents      Transactions     Outcomes
                 Obs.   Judg.     Obs.   Judg.     Obs.   Judg.
UK                53    116        20     43         1      3
USA               28     20         0      0         0      0
Overall           81    136        20     43         1      3
Table 2. Stake's categories for evaluation applied to three case studies
involving the evaluation of microcomputer software in schools.

                 Antecedents      Transactions     Outcomes
                 Obs.   Judg.     Obs.   Judg.     Obs.   Judg.


Case Study 1 


xxxxxx 


XX 


XX 


XXXXXXXXX 


Case Study 2 


xxxxxxx 


XXXXXXXXXX 
XXXXXXXXXX 
XXXXXXXXXX 
XXXXXXXXXX 
XXXXXXXXXX 


XXXXXXXXXX 
XXXXXXXXXX 
XXX 




XXXXXXXXXX 
XXXXXXXXXX 

xxxxx 


Case Study 3


XX 


xxxxxxxxxx 
xxxxxxxxxx 

X 


XXXXXXXXXX 
XXXXXXXXXX 
XXXXXXXXXX

xxxxxxxx 


XXXXXXXXXX 
XXXXXXXXXX 
XXXXXXXXXX 
XXXXXXXXXX 
XXX 



XXXXXXXXXX 
XXXXXXXXXX 
XXXXXXXXXX 
XXXXXXXXXX 

xxxxxxx 


Totals           Antecedents      Transactions     Outcomes
                 Obs.   Judg.     Obs.   Judg.     Obs.   Judg.
                   2     34        88     68        12     91
Table 3. Comparison of totals of evaluation observations and judgments
in checklists and case studies.

                 Antecedents      Transactions     Outcomes
                 Obs.   Judg.     Obs.   Judg.     Obs.   Judg.    Total
Checklists        81    136        20     43         1      3      284
%                 29     48         7     15         0      1      100
Case Studies       2     34        88     68        12     91      295
%                  1     11        30     23         4     31      100
Appendix A

Sample checklist with Stake categories added

AO

I. Content

i   Is the content of educational value?  AJ
ii  Is the content up to date and accurate?
iii Is there provision for adding and updating content material?  AO

II. Technical Considerations

i   Is the package compatible with the computer(s) in use and any
    peripherals that are also needed?  AO
ii  Is the program easy to load and does it run immediately on loading?
iii Is the program capable of being used by students independently of a
    teacher?  at
iv  Is the program reliable and 'crash proof' in normal use?  AJ

III. Pedagogical Considerations

i   Is the purpose of the program clearly defined? Is it clear to the
    students?  AJ, TJ
ii  Does the program allow students to enter it at a variety of starting
    points at different levels?  AJ
iii Is the presentation of the content
      clear  AJ
      logical  AJ
      consistent?  AJ
iv  Is there appropriate use of
      colour
      sound
      graphics?
    Are the computer and VDU being used to handle colour, sound and
    graphics appropriately in the classroom situation?
v   Does the program provide diagnostic help so as to suggest further
    appropriate activities for the student?  AJ

AJ
AJ
AJ

IV. Student Appeal and 'User-Friendliness'

i   Is the program motivating to the age range(s) for which it is
    intended?  OJ
ii  Does the program allow for student interaction and/or creativity?  OJ
iii Is the program one that gives the student adequate an^ m-
    about progress?
iv  Does information about student error lead to 'prompts' so that the
    student can continue to proceed successfully with the program?  TO
v   Can the student easily exit from the program so as to avoid the
    frustration resulting from continued failure?

TJ



Appendix B 

Sample from Case Study 3 with Stake Categories added 



2-4 per group. If 4 then one person tends to take charge.
2 tends to be a better number.

The computer is going to be in the classroom, and used in it,
so noise can get grating. Can it be turned down?
TO
TJ

2. Disks more reliable than cassettes.  AJ

3. Cost is critical.  AJ

4. Must be flexible.  AJ

5. M. Ellis had to explain to his class basic hangman clues.
Every word has a vowel etc. This could possibly go to the  OJ
teacher's notes.

6. The vocabulary is slightly too difficult for this 10/11 year
old group. However, they did seem to be coping quite well.

Must have the facility to put in your own vocabulary and to  OJ
link it with your own reading schemes should this be desired.

OJ

8. Word score confusing. This needs to relate to the words tried  TJ
by any one person. Also, it is a cumulative scheme, and
doesn't reflect the word just done. For example the scoring
can go 100%, 0%, 50%, 66%...

OJ

9. Letter score is useful, and reflects the pupils' facility with
words. It would be useful to have some feedback. However,
get away from percents and be far simpler. Say 'number of
words tried' and 'number of words achieved'. Also, for the
letter count, this is better expressed as number of letters in
the word 'is' and number of letters tried 'is'.

10. Histographs are not always helpful/relevant. It would be more  OJ
valuable to get children to pick out patterns in the English
language. For example, what letters are likely to go with
'ia', 'ai', 'ci', 'ti', 'th', prefixes and suffixes.

11. Children use a dictionary to help with words. The teacher here  TO
found follow-up work to find out the meaning useful.  TJ



